Abstract: Wilkinson conversion of stored samples in large Switched Capacitor Array (SCA)
ASICs, such as those used for high-speed waveform sampling, has many benefits in terms of
compactness, no missing output codes, low power requirements and robustness. However, such
Analog-to-Digital conversions are relatively slow, limited by the encoder clock speed. By
repeating the same fast sampling technique used by the SCA, combined with a fast priority
encoder, significantly faster conversion is demonstrated for a prototype ASIC designated PRO1.
For 8-10 bits of resolution, this technique is compact and requires far fewer system resources.
1. Background





Precision timing and amplitude instrumentation of large arrays of photodetector elements for
future applications in collider and astroparticle physics has been enabled by the proliferation
of high-performance and low-cost waveform sampling devices [1, 2, 3, 4]. To expand this technique
in a cost-effective manner to systems consisting of 0.1-1 million channels, certain features
could be quite useful. In particular, next-generation TeV gamma and Super B-factory detector
applications require trigger rates of 10's of kHz while providing multi-buffer capability. This
requirement places a premium on analog conversion performance.

We present the results of an ASIC developed for the flash encoding of photodetector signals,
a number of methods of which have been evaluated [^. This concept is an outgrowth of earlier
work ^ and is illustrated in Fig. 1, part a), where the leading and trailing edge times are used
to determine the timing and Time-Over-Threshold (TOT) for an analog waveform. A high level
threshold crossing may be used to reduce the intrinsic time determination error due to amplitude
dependence ("time walk"). Part b) of the figure illustrates the flash time encoding of the digital
output of a ramp (Wilkinson) comparator.
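The leading edge, trailing edge and TOT quantities described above can be illustrated in software. The following is only a minimal sketch on an already digitized waveform, not the on-chip comparator logic; the function name `threshold_crossings`, the example waveform and the 1 ns sample spacing are assumptions made for illustration.

```python
def threshold_crossings(samples, dt_ns, threshold):
    """Return (leading_ns, trailing_ns) for the first pulse crossing threshold.

    Crossing times are linearly interpolated between adjacent samples.
    Returns None if no complete pulse is found.
    """
    leading = None
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if leading is None and prev < threshold <= cur:
            # Rising crossing: leading edge time
            leading = (i - 1 + (threshold - prev) / (cur - prev)) * dt_ns
        elif leading is not None and prev >= threshold > cur:
            # Falling crossing: trailing edge time
            trailing = (i - 1 + (prev - threshold) / (prev - cur)) * dt_ns
            return leading, trailing
    return None


# Hypothetical triangular pulse in mV, sampled every 1 ns
waveform = [0, 0, 100, 300, 500, 300, 100, 0, 0]
leading, trailing = threshold_crossings(waveform, dt_ns=1.0, threshold=200.0)
tot = trailing - leading  # Time-Over-Threshold in ns
```

A second call with a higher threshold would give the high level crossing that the text mentions for coarse time walk correction.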



Figure 1. Flash encoding concept: a) the analog waveform recording with leading edge LL and
trailing edge TL for TOT and high level HL for coarse Time Walk Correction; b) simplified scheme
for fast transition timing edge encoding.
Figure 2. A block diagram of the PRO1 readout, where only the Low Level (LL) threshold crossing
output is considered. A compact cascade implementation for the priority encoder limited the
settling time, and can be improved.



2. Architectural Details 

The Photodetector Read Out version 1 (PRO1) ASIC was developed to evaluate the use of waveform
sampling in conjunction with threshold crossing encoding to provide flash, although coarse,
determination of signal pulse parameters. Relatively slow risetime signals, combined with
channel-to-channel comparator threshold spreads, resulted in limited performance using this
technique, at least for compact arrays and precision timing (sub-100 ps resolution) applications.
However, for the encoding of fast Wilkinson comparator outputs, the recording concept illustrated
in Fig. 1b) shows promise to improve upon limitations of the high-speed Gray Code Counter (GCC)
scheme usually employed [0]. In this fast Wilkinson technique, a ramp is started coincident in
time with the propagation of a write-pointer strobe across the sampling array. The comparator
output transition time is analog captured and each sample evaluated with a low-power comparator.
The power required to operate each 8-bit sampling row is about 9 mW, and could be lowered further
by disabling the comparator bias when the conversion cycle is completed.

A priority encoder determines the location of the first threshold crossing cell. This signal flow
is illustrated in Fig. 2. Typical effective times between samples in the array of 100's of ps
(many GSa/s effective) are common in 0.25-0.35 μm CMOS processes [|l|, ^. Obtaining similar GHz
digital counter rates in a companion FPGA is either difficult or very power and routing resource
intensive.
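The role of the priority encoder can be sketched in a few lines. This is a behavioral illustration only, assuming one boolean comparator output per storage cell; the 256-cell row and 400 ps nominal step are taken from Table 2, while the function names and the example pattern are hypothetical.

```python
def priority_encode(cell_outputs):
    """Return the index of the first cell whose comparator fired, or None."""
    for index, fired in enumerate(cell_outputs):
        if fired:
            return index
    return None


def crossing_time_ps(cell_outputs, step_ps):
    """Convert the encoded cell index into a time using the average step."""
    index = priority_encode(cell_outputs)
    return None if index is None else index * step_ps


N_CELLS = 256  # one storage row, giving an 8-bit output code

# Hypothetical row: the captured comparator transition is first seen in cell 100
cells = [False] * 100 + [True] * (N_CELLS - 100)
code = priority_encode(cells)
time_ps = crossing_time_ps(cells, step_ps=400.0)
```

Because only the first set cell matters, later cells may hold any value without changing the result, which is what distinguishes a priority encoder from a plain binary encoder.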

Table 1 shows the system requirements for a Xilinx FPGA functioning as a Time-to-Digital
Converter (TDC) using a high speed digital reference clock and a GCC. A properly pipelined, dual
clock phase GCC was used when simulating the number of FPGA logic slices, flip-flops, and
Look-up Tables (LUTs) required. Such a counter counts in Gray code using an array of pipelined
flip-flops. The PRO1 specifications relevant to its application as a TDC are listed in Table 2.

Table 1. Programmable logic system requirements for a Xilinx Virtex or Spartan FPGA functioning as a 
TDC using a high speed digital reference clock and a GCC. 
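For readers unfamiliar with Gray code counting, the sketch below shows the standard reflected binary Gray code conversions. It is a software illustration of the code itself, not of the pipelined FPGA counter that was simulated; the function names are chosen here for illustration.

```python
def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse conversion: XOR together all right shifts of the Gray code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# For an 8-bit counter, consecutive codes differ in exactly one bit
single_bit_steps = all(
    bin(binary_to_gray(i) ^ binary_to_gray(i + 1)).count("1") == 1
    for i in range(255)
)
```

The single-bit transition property is why a Gray code counter can be latched asynchronously by a comparator edge with at most one count of ambiguity.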
Fig. 3 shows a photograph of the PRO1 bare die. As noted earlier, additional circuitry exists for
prototyping other functionality. For the measurement reported, only a single channel and storage
row are considered, and only the Low-level comparator output thereof. When implementing only
that functionality, the density clearly can and will be increased, though a constraint is imposed
by the need to reduce the settling time of the select logic tree employed. This logic can take as
long as the write-pointer propagation time across the array to settle (100's of ns).

3. Readout Test System 

A printed circuit board was fabricated to evaluate PRO1 performance, a photograph of which is
shown in Fig. 4. The three main components on this circuit board are a packaged PRO1 ASIC, an
FPGA, and a Universal Serial Bus (USB) interface chip. The external communication interface is
via USB 2.0, using the Cypress CY7C68013-56PVC USB microcontroller. This microcontroller handles
the data transferred between the FPGA and the host computer. The FPGA, a Xilinx XC3S200, controls
the digital logic and timing for the PRO1 readout. RAM banks internal to the FPGA buffer the
digitized values while the data is being dumped into the USB data stream. A basic software tool
was developed to send commands to the FPGA and record PRO1 data via the USB 2.0 interface.

Table 2. Relevant sampling specifications for the PRO1 ASIC when used as a flash TDC.
Measurements from a single channel, single storage row, consisting of 256 samples are presented.

Parameter            Value      Unit
Sampling Rate        1.0-2.7    GSa/s
Full range           95-250     ns
Nominal Time Step    400        ps (2.5 GSa/s)
Number of inputs     4          channels per PRO1
Sample rows          4          per channel
Encoding output      8          bits

Figure 3. A bare die photograph of the PRO1 ASIC. The die is 3.21 mm by 3.03 mm and is fabricated
in the TSMC 0.25 μm process.

4. Test Results 

By using the readout system described in the previous section, a number of the basic performance
parameters of the PRO1 ASIC were evaluated. Because timing performance is such a critical feature
of this ASIC functioning as a TDC for Wilkinson conversion, each parameter is described in detail
in the following subsections.

Ver. 1.2 2008/11/03
Figure 4. Photograph of the PRO1 evaluation circuit board.

4.1 Sampling speed 

Determination of the sampling speed is made by measuring the time interval between insertion of 
the timing strobe and appearance of the output pulse from the last cell of the row, minus pad buffer 
delays. The sampling speed is calculated by taking the number of cells in a row and dividing it 
by the propagation time for a given control voltage setting. A plot of the sampling speed versus 
control voltage (ROVDD) is shown in Fig. 5, where it is seen that sampling rates from below 0.3
GSa/s to above 4.5 GSa/s are possible.
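The sampling speed calculation described above amounts to a single division. In the sketch below the 256-cell row length comes from Table 2, while the measured interval and the pad buffer delay are hypothetical numbers chosen only to illustrate the arithmetic.

```python
def sampling_rate_gsa(n_cells, strobe_to_output_ns, pad_delay_ns=0.0):
    """Sampling rate in GSa/s from the strobe propagation time across one row."""
    propagation_ns = strobe_to_output_ns - pad_delay_ns
    if propagation_ns <= 0:
        raise ValueError("propagation time must be positive")
    # Cells per nanosecond is numerically equal to GSa/s
    return n_cells / propagation_ns


# Hypothetical measurement: 105.4 ns strobe-to-output, 3.0 ns of pad buffer delay
rate = sampling_rate_gsa(256, strobe_to_output_ns=105.4, pad_delay_ns=3.0)
step_ps = 1000.0 / rate  # average time per cell in ps
```

With these assumed inputs the result is about 2.5 GSa/s, or 400 ps per cell, which corresponds to the nominal time step in Table 2.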

4.2 Temperature Dependence 

One potential disadvantage of this voltage controlled delay technique is that the circuit is
temperature dependent. This dependence is seen in Fig. 6 and is roughly 0.3%/°C around room
temperature, which matches the expectation from SPICE simulation. For many applications this
variation would not be significant, and it can be calibrated out with an external reference
clock [§].
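To give a feel for the size of this effect and for how a reference clock could remove it, the sketch below estimates the accumulated drift across one row and derives the average time step from the cells at which successive reference clock edges are recorded. The 0.3%/°C coefficient, 256 cells and 400 ps step come from the text and Table 2; the 10 ns reference period, the edge positions and the function names are assumptions for illustration only.

```python
def full_range_drift_ps(n_cells, step_ps, delta_t_c, coeff_per_c=0.003):
    """Approximate change in the full sampling window for a temperature change."""
    return n_cells * step_ps * coeff_per_c * delta_t_c


def step_from_reference(edge_cells, ref_period_ps):
    """Average time step from cell indices of successive reference clock edges."""
    if len(edge_cells) < 2:
        raise ValueError("need at least two reference edges")
    spans = [b - a for a, b in zip(edge_cells, edge_cells[1:])]
    return ref_period_ps / (sum(spans) / len(spans))


# A 5 degree C change over a 256-cell, 400 ps/cell window
drift = full_range_drift_ps(256, 400.0, delta_t_c=5.0)

# Hypothetical 100 MHz reference (10 ns period) seen at these cells
measured_step = step_from_reference([10, 35, 60, 85], ref_period_ps=10000.0)
```

Under these assumptions a 5 °C change moves the end of the window by roughly 1.5 ns, several nominal time steps, while the reference measurement recovers the 400 ps step directly.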

4.3 Timing Performance 

The PRO1 timing performance was evaluated using the test setup shown in Fig. 7. A synchronization
pulse from the evaluation circuit board goes through an RF splitter. One copy of the sync pulse
is fed back with fixed cable delay into the evaluation board to create the sampling strobe. The
other copy is used to trigger an Avtech AVMP-2-C-P-EPIA pulser. The pulser output was tuned to a
500 mV amplitude with a 0.5 ns rise time and 10 ns full width at half maximum (FWHM) duration.
The discriminator on the PRO1 ASIC was set to trigger on the pulse's rising edge at 250 mV. The
pulser output was inserted into the RF input of the PRO1 ASIC and scanned across the sampling
window using a variable delay module.

Figure 5. Sampling rate as a function of the ROVDD control voltage, where extended operation
( 2.5V) is possible.

Figure 6. Temperature dependence of the sampling rate.

Figure 7. Schematic of the PRO1 timing measurement.




Figure 8. Plot showing the linear response of the PRO1 output with respect to a fixed time
displacement.

Figure 9. Plot of the residual data structure from subtracting the linear fit to the data points.

By scanning the sampling window with this test setup, the linear response of the ASIC's digital
output versus the time difference is obtained, as shown in Fig. 8. Each data point in Fig. 8 is
the average PRO1 output for 10k events. The slope of the linear fit shows the average time step
to be 373 ps between sampling pixels with ROVDD tied to the ASIC's VDD. The result of subtracting
the linear fit from the data points is shown in Fig. 9. The structure in the residual plot can be
used to create a bin-by-bin correction to improve PRO1 ASIC timing. The projection of all
residual events is shown in Fig. 10, which has an RMS timing jitter of about 673 ps. By applying
a bin-by-bin timing calibration to the PRO1 ASIC's digital output, the RMS timing jitter is
reduced to 163 ps, as shown in Fig. 11. This TDC performance, after applying the calibration
corrections, is not so far from the ideal binary interpolation limit (108 ps). The additional
timing error is attributed to sampling speed temperature drift and storage-cell-dependent
comparator threshold dispersion.

Figure 10. Histogram showing the timing jitter with no calibrations.

Figure 11. Histogram showing the timing jitter with calibrations.
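The bin-by-bin correction can be sketched as follows: fit a straight line to time versus output code, take the mean residual for each code as that bin's correction, and subtract it. This is an illustrative reconstruction on synthetic data, not the analysis code used for the measurement; the per-bin offsets and scatter below are invented. The helper `ideal_limit_ps` assumes the quoted 108 ps limit is the usual uniform-bin quantization error, step/sqrt(12), with which the 373 ps step appears consistent.

```python
import math
from collections import defaultdict


def linear_fit(xs, ys):
    """Least-squares slope and offset of ys versus xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def bin_corrections(codes, times_ps):
    """Return per-event residuals to a linear fit and the mean residual per code."""
    slope, offset = linear_fit(codes, times_ps)
    residuals = [t - (slope * c + offset) for c, t in zip(codes, times_ps)]
    per_bin = defaultdict(list)
    for c, r in zip(codes, residuals):
        per_bin[c].append(r)
    table = {c: sum(rs) / len(rs) for c, rs in per_bin.items()}
    return residuals, table


def ideal_limit_ps(step_ps):
    """RMS quantization error of a uniform bin of width step_ps."""
    return step_ps / math.sqrt(12.0)


# Synthetic data: 8 bins with invented systematic offsets and +/-10 ps scatter
pattern = [50.0, -50.0, 30.0, -30.0, 50.0, -50.0, 30.0, -30.0]
codes, times = [], []
for c in range(8):
    for noise in (-10.0, 10.0):
        codes.append(c)
        times.append(373.0 * c + pattern[c] + noise)

raw, table = bin_corrections(codes, times)
corrected = [r - table[c] for c, r in zip(codes, raw)]
```

In this toy case the correction removes the systematic per-bin structure and leaves only the event-to-event scatter, which mirrors the improvement from the uncalibrated to the calibrated histogram.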



5. ADC Implementation 

When this circuit is used for Wilkinson conversion, a calibration must be performed to remove
the systematic errors. This calibration is done by applying a fixed DC voltage to the analog
input of the Wilkinson ADC. By stepping through different fixed DC voltages within the ADC
dynamic range, the average PRO1 output is mapped as a function of voltage. From these
calibration measurements, a look-up table can be generated to convert the PRO1 output into
voltage for a Wilkinson conversion application.
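A look-up table of this kind could be built and applied as in the sketch below. It assumes the mean output code increases monotonically with input voltage and uses linear interpolation between calibration points; the voltages, codes and function names are hypothetical and serve only to show the procedure.

```python
from bisect import bisect_left


def build_lut(calibration_runs):
    """Map each DC voltage to its mean output code, sorted by mean code.

    calibration_runs: dict of voltage -> list of recorded output codes.
    """
    return sorted(
        (sum(codes) / len(codes), voltage)
        for voltage, codes in calibration_runs.items()
    )


def code_to_voltage(code, lut):
    """Interpolate a voltage from an output code; clamp outside the table."""
    means = [m for m, _ in lut]
    if code <= means[0]:
        return lut[0][1]
    if code >= means[-1]:
        return lut[-1][1]
    i = bisect_left(means, code)
    (c0, v0), (c1, v1) = lut[i - 1], lut[i]
    return v0 + (v1 - v0) * (code - c0) / (c1 - c0)


# Hypothetical calibration: three DC levels, two events each
runs = {0.5: [40, 42], 1.0: [100, 100], 1.5: [161, 159]}
lut = build_lut(runs)
voltage = code_to_voltage(130, lut)
```

A real calibration would use many more voltage steps and events per step, so that the table also absorbs the bin-to-bin structure seen in the timing residuals.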

The method of using time interpolation by digital delay lines for boosted Wilkinson conversion
has been demonstrated to achieve TDC resolution as low as 20 ps [jsj. While the PRO1 TDC
resolution is roughly 20 times coarser and slower than the digital delay line method, the PRO1
method is useful in applications with a fast readout duty cycle using deep SCA storage for
buffering during the readout deadtime. Since the PRO1 TDC method does not require an external
reference clock, less switching noise couples onto the analog voltage stored on the SCA.

6. Future Directions 

Demonstration of the efficacy of this technique has led to its adoption in the 2nd generation of
large, buffered analog storage device for precision timing ASIC (BLAB2). It will also be featured
in a device intended for continuous monitoring of turn-by-turn x-ray emission of high luminosity
electron storage rings (STURM). In both cases the low power and faster conversion speed improve
the density and reduce processing speed and overall readout system overhead, essential for future
mega-channel readout systems.

7. Summary 

A first generation fast Wilkinson encoder CMOS device has been studied in a 0.25 μm process.
This architecture is optimized to reduce deadtime and power consumption while operating at an
effective multi-GHz digital counter rate for fast Wilkinson conversion. The demonstrated low
power and high timing resolution make this architecture well suited to integrating a data
collection FPGA with an SCA waveform sampling ASIC, while reducing the amount of FPGA resources
needed.